to go along with
Modern Data Science with R, 3rd edition by Baumer, Kaplan, and Horton
R for Data Science, 2nd edition by Wickham, Çetinkaya-Rundel, and Grolemund
geom_point()
aes() function
wday.
ggplot() function.
aes() function.
aes() function

| year | Algeria | Brazil | Columbia |
|---|---|---|---|
| 2000 | 7 | 12 | 16 |
| 2001 | 9 | 14 | 18 |

| country | Y2000 | Y2001 |
|---|---|---|
| Algeria | 7 | 9 |
| Brazil | 12 | 14 |
| Columbia | 16 | 18 |

| country | year | value |
|---|---|---|
| Algeria | 2000 | 7 |
| Algeria | 2001 | 9 |
| Brazil | 2000 | 12 |
| Brazil | 2001 | 14 |
| Columbia | 2000 | 16 |
| Columbia | 2001 | 18 |
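
The three tables above hold the same six values in different shapes. As a minimal sketch, assuming the tidyr package is installed, the first (wide) table can be reshaped into the third (long) one with `pivot_longer()`. The object names `wide` and `long` are made up for illustration and do not come from the original questions.

```r
library(tidyr)

# First table from the text: one column per country (names are hypothetical)
wide <- data.frame(
  year = c(2000, 2001),
  Algeria = c(7, 9),
  Brazil = c(12, 14),
  Columbia = c(16, 18)
)

# Stack the country columns into country/value pairs
long <- wide |>
  pivot_longer(cols = -year, names_to = "country", values_to = "value")

# Reorder rows and columns to mirror the layout of the third table
long <- long[order(long$country, long$year), c("country", "year", "value")]
long
```

Each row of `long` is one country in one year, which is the shape most `ggplot()` and `group_by()` calls expect.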
#(a)
babynames |>
  group_by(year, sex) |>
  summarize(totalBirths = sum(n))
#(b)
group_by(babynames, year, sex) |>
  summarize(totalBirths = sum(n))
#(c)
group_by(babynames, year, sex) |>
  summarize(totalBirths = mean(n))
#(d)
temp <- group_by(babynames, year, sex)
summarize(temp, totalBirths = sum(n))
#(e)
summarize(group_by(babynames, year, sex),
          totalBirths = sum(n))

filter()
arrange()
select()
mutate()
group_by()

(year, sex)
(year, name)
(year, n)
(sex, name)
(sex, n)

n_distinct(name)
n_distinct(n)
sum(name)
sum(n)
mean(n)

library(babynames)
babynames |>
  filter(name %in% c("Jane", "Mary")) |>
  # just the Janes and Marys
  group_by(name, year) |>
  # for each year for each name
  summarize(total = sum(n))
# A tibble: 276 × 3
# Groups: name [2]
name year total
<chr> <dbl> <int>
1 Jane 1880 215
2 Jane 1881 216
3 Jane 1882 254
4 Jane 1883 247
5 Jane 1884 295
6 Jane 1885 330
7 Jane 1886 306
8 Jane 1887 288
9 Jane 1888 446
10 Jane 1889 374
# ℹ 266 more rows
babynames |>
  filter(name %in% c("Jane", "Mary")) |>
  group_by(name, year) |>
  summarize(number = sum(n))
# A tibble: 276 × 3
# Groups: name [2]
name year number
<chr> <dbl> <int>
1 Jane 1880 215
2 Jane 1881 216
3 Jane 1882 254
4 Jane 1883 247
5 Jane 1884 295
6 Jane 1885 330
7 Jane 1886 306
8 Jane 1887 288
9 Jane 1888 446
10 Jane 1889 374
# ℹ 266 more rows
babynames |>
  filter(name %in% c("Jane", "Mary")) |>
  group_by(name, year) |>
  summarize(n_distinct(name))
# A tibble: 276 × 3
# Groups: name [2]
name year `n_distinct(name)`
<chr> <dbl> <int>
1 Jane 1880 1
2 Jane 1881 1
3 Jane 1882 1
4 Jane 1883 1
5 Jane 1884 1
6 Jane 1885 1
7 Jane 1886 1
8 Jane 1887 1
9 Jane 1888 1
10 Jane 1889 1
# ℹ 266 more rows
babynames |>
  filter(name %in% c("Jane", "Mary")) |>
  group_by(name, year) |>
  summarize(n_distinct(n))
# A tibble: 276 × 3
# Groups: name [2]
name year `n_distinct(n)`
<chr> <dbl> <int>
1 Jane 1880 1
2 Jane 1881 1
3 Jane 1882 1
4 Jane 1883 1
5 Jane 1884 1
6 Jane 1885 1
7 Jane 1886 1
8 Jane 1887 1
9 Jane 1888 1
10 Jane 1889 1
# ℹ 266 more rows
Error in `summarize()`:
ℹ In argument: `sum(name)`.
ℹ In group 1: `name = "Jane"` and `year = 1880`.
Caused by error in `base::sum()`:
! invalid 'type' (character) of argument
# A tibble: 276 × 3
# Groups: name [2]
name year `mean(n)`
<chr> <dbl> <dbl>
1 Jane 1880 215
2 Jane 1881 216
3 Jane 1882 254
4 Jane 1883 247
5 Jane 1884 295
6 Jane 1885 330
7 Jane 1886 306
8 Jane 1887 288
9 Jane 1888 446
10 Jane 1889 374
# ℹ 266 more rows
# A tibble: 276 × 3
# Groups: name [2]
name year `median(n)`
<chr> <dbl> <dbl>
1 Jane 1880 215
2 Jane 1881 216
3 Jane 1882 254
4 Jane 1883 247
5 Jane 1884 295
6 Jane 1885 330
7 Jane 1886 306
8 Jane 1887 288
9 Jane 1888 446
10 Jane 1889 374
# ℹ 266 more rows
gdp
year
gdpval
country
–country

gdp
year
gdpval
country
–country

gdp
year
gdpval
country
–country

ggplot() code. Which data frame should you use?37
pivot_wider() on raw data
pivot_longer() on raw data
# A tibble: 18 × 11
Subject day_0 day_1 day_2 day_3 day_4 day_5 day_6 day_7 day_8 day_9
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 308 250. 259. 251. 321. 357. 415. 382. 290. 431. 466.
2 309 223. 205. 203. 205. 208. 216. 214. 218. 224. 237.
3 310 199. 194. 234. 233. 229. 220. 235. 256. 261. 248.
4 330 322. 300. 284. 285. 286. 298. 280. 318. 305. 354.
5 331 288. 285 302. 320. 316. 293. 290. 335. 294. 372.
6 332 235. 243. 273. 310. 317. 310 454. 347. 330. 254.
7 333 284. 290. 277. 300. 297. 338. 332. 349. 333. 362.
8 334 265. 276. 243. 255. 279. 284. 306. 332. 336. 377.
9 335 242. 274. 254. 271. 251. 255. 245. 235. 236. 237.
10 337 312. 314. 292. 346. 366. 392. 404. 417. 456. 459.
11 349 236. 230. 239. 255. 251. 270. 282. 308. 336. 352.
12 350 256. 243. 256. 256. 269. 330. 379. 363. 394. 389.
13 351 251. 300. 270. 281. 272. 305. 288. 267. 322. 348.
14 352 222. 298. 327. 347. 349. 353. 354. 360. 376. 389.
15 369 272. 268. 257. 278. 315. 317. 298. 348. 340. 367.
16 370 225. 235. 239. 240. 268. 344. 281. 348. 365. 372.
17 371 270. 272. 278. 282. 279. 285. 259. 305. 351. 369.
18 372 269. 273. 298. 311. 287. 330. 334. 343. 369. 364.
sleep_long <- sleep_wide |>
  pivot_longer(cols = -Subject,
               names_to = "day",
               names_prefix = "day_",
               values_to = "reaction_time")
sleep_long
# A tibble: 180 × 3
Subject day reaction_time
<dbl> <chr> <dbl>
1 308 0 250.
2 308 1 259.
3 308 2 251.
4 308 3 321.
5 308 4 357.
6 308 5 415.
7 308 6 382.
8 308 7 290.
9 308 8 431.
10 308 9 466.
# ℹ 170 more rows
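
In the output above, `day` is stored as `<chr>` because it was built from column names. A minimal sketch of one way to get a numeric day, assuming tidyr 1.1.0 or later is installed: the `names_transform` argument converts the new column during the pivot. The data frame `sleep_small` is a hypothetical two-subject excerpt with rounded values, not the full `sleep_wide` data.

```r
library(tidyr)

# Hypothetical excerpt with the same column layout as sleep_wide (values rounded)
sleep_small <- data.frame(
  Subject = c(308, 309),
  day_0 = c(250, 223),
  day_1 = c(259, 205)
)

sleep_small_long <- sleep_small |>
  pivot_longer(cols = -Subject,
               names_to = "day",
               names_prefix = "day_",
               # convert the stripped names ("0", "1") to integers
               names_transform = list(day = as.integer),
               values_to = "reaction_time")
sleep_small_long
```

With `day` as an integer, it can go straight onto a continuous axis in `ggplot()` without a separate `mutate()` step.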
right_join()?38
right_join()?39
name
band
plays
plays variable in a full_join()?40
NA
NULL
grep("q[^u]", very.large.word.list) would not match which of the following?49
"(?<=\\$)\\d"
"(?<=\\$)\\d+"
"\\d(?=\\$)"
"\\d+(?=\\$)"
"\\w+(?!pie)"
"\\w+(?! pie)"
"\\w+(?=pie)"
"\\w+(?= pie)"
[1] "apple" "chocolate" "peach"
addTen() function. The following output is a result of which map_*() call?64
map(c(1,4,7), addTen)
map_dbl(c(1,4,7), addTen)
map_chr(c(1,4,7), addTen)
map_lgl(c(1,4,7), addTen)
[1] "11.000000" "14.000000" "17.000000"
map(c(1, 4, 7), addTen)
map(list(1, 4, 7), addTen)
map(data.frame(a=1, b=4, c=7), addTen)
map(c(1, 4, 7), addTen)
map(c(1, 4, 7), ~addTen(.x))
map(c(1, 4, 7), ~addTen)
map(c(1, 4, 7), function(hi) (hi + 10))
map(c(1, 4, 7), ~(.x + 10))
jan31.months() is not a function.
jan31.ymd() is not a function.
jan31.ymd() is not a function.
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
jan31 <- ymd("2021-01-31")
jan31 + months(0:11) + days(31)
#> [1] "2021-03-03" NA "2021-05-01" NA "2021-07-01"
#> [6] NA "2021-08-31" "2021-10-01" NA "2021-12-01"
#> [11] NA "2022-01-31"
ifelse() function takes the arguments:71
set.seed() function75
N(talent, 15)
grades and SAT are to talent (bias?)
replace = TRUE)
replace = FALSE)
.
#
< >
[ ]
<img> (image) element?98
<img>
.img
#img
[img]
img
href= (URL) attribute?99
<href>
href
#href
[href]
href
tbl and R tibble both in storage
tbl and R tibble both in memory
tbl in storage and R tibble in memory
tbl in memory and R tibble in storage
SELECT Persons.FirstName FROM Persons
SELECT FirstName FROM Persons
SELECT "FirstName" FROM "Persons"
SELECT Persons
SELECT * FROM Persons
SELECT [all] FROM Persons
SELECT *.Persons
SELECT COLUMNS(*) FROM Persons
SELECT COUNT(*) FROM Persons
SELECT NO(*) FROM Persons
SELECT LEN(*) FROM Persons
SELECT * FROM Persons WHERE FirstName <> 'Peter'
SELECT * FROM Persons WHERE FirstName = 'Peter'
SELECT * FROM Persons WHERE FirstName == 'Peter'
SELECT * FROM Persons WHERE FirstName LIKE 'Peter'
SELECT [all] FROM Persons WHERE FirstName = 'Peter'
SELECT FirstName = 'Peter', LastName = 'Jackson' FROM Persons
SELECT * FROM Persons WHERE FirstName = Peter' & LastName = Jackson'
SELECT * FROM Persons WHERE FirstName = 'Peter' AND LastName = 'Jackson'
SELECT * FROM Persons WHERE FirstName = Peter' | LastName = Jackson'
BETWEEN
WITHIN
RANGE
SELECT LastName > 'Hansen' AND LastName < 'Pettersen' FROM Persons
SELECT * FROM Persons WHERE LastName BETWEEN 'Hansen' AND 'Pettersen'
SELECT * FROM Persons WHERE LastName > 'Hansen' AND LastName < 'Pettersen'
SELECT UNIQUE
SELECT DISTINCT
SELECT DIFFERENT
ORDER BY
ORDER
SORT
SORT BY
SELECT * FROM Persons ORDER FirstName DESC
SELECT * FROM Persons SORT 'FirstName' DESC
SELECT * FROM Persons ORDER BY FirstName DESC
SELECT * FROM Persons SORT BY 'FirstName' DESC
SELECT the records with foods that are either green or yellow fruit:118
WHERE type = 'fruit' AND color = 'yellow' OR color = 'green'
WHERE (type = 'fruit' AND color = 'yellow') OR color = 'green'
WHERE type = 'fruit' AND (color = 'yellow' OR color = 'green')
WHERE type = 'fruit' AND color = 'yellow' AND color = 'green'
WHERE type = 'fruit' AND (color = 'yellow' AND color = 'green')
JOIN?119
SELECT statement.
UNION operator in SQL?120
SELECT statements.
SELECT statement.
INNER JOIN in SQL?121
SELECT statement.
LEFT JOIN in SQL?122
SELECT statement.
RIGHT JOIN keeps all the rows in …?123
RIGHT JOIN?124
RIGHT JOIN?125
FULL JOIN?126
NULL
SELECT * FROM Persons WHERE FirstName = 'a.*'
SELECT * FROM Persons WHERE FirstName = 'a*'
SELECT * FROM Persons WHERE FirstName REGEXP 'a.*'
SELECT * FROM Persons WHERE FirstName REGEXP 'a*'
SELECT * FROM Persons WHERE FirstName REGEXP '(?i)a.*'
shinyApp()
createApp()
runApp()
startShinyApp()
ui
server
runApp()
shinyApp()
selectInput()
radioButtons()
checkboxGroupInput()
textInput()
renderText() function?135
ui component in a Shiny app represent?136
reactive()
render()
observe()
updateInput()
renderPlot() in Shiny?138
sliderInput("slider", "Slider", min = 1, max = 100, value = 50)
inputSlider("slider", min = 1, max = 100)
sliderControl("slider", 1, 100)
input_slider("slider", 1, 100)
Wherever you are, make sure you are communicating with me when you have questions!
Wherever you are, make sure you are communicating with me when you have questions!
no right answer here!
Yes! All the responses are reasons to make a figure.
aes() function
wday.
aes() function
Answers may vary. I’d say c. putting the work in context. Others might say b. facilitating comparison or d. simplifying the story. However, I don’t think a. making the data stand out is a correct answer.
(c) uses mean() (average) instead of sum(). The other commands compute the total number of births broken down by year and sex.
filter()
(year, name)
sum(n)
running the different code chunks with relevant output.
-country
year
gdpval (if possible, it is a good idea to name variables something different from the name of the data frame)
pivot_longer() on raw data. The reference to the study is: Gregory Belenky, Nancy J. Wesensten, David R. Thorne, Maria L. Thomas, Helen C. Sing, Daniel P. Redmond, Michael B. Russo and Thomas J. Balkin (2003) Patterns of performance degradation and restoration during sleep restriction and subsequent recovery: a sleep dose-response study. Journal of Sleep Research 12, 1–12.
NA (it would be NULL in SQL)
str_replace() is vectorized)
str_sub() is vectorized. So the subset of string one is from 1 to 2. The subset of string two is from 3 to 5.
I don’t know what the answer is. Ill-defined question.
9 the second produces Sep)
Neither c. nor e. would match. The bracket expression “[^u]” matches any single character other than “u”, but it must match some character, so a “q” at the very end of a string does not match.
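
The point about `[^u]` needing a character to consume can be checked with a small base R sketch. The vector `words` below is a hypothetical stand-in for `very.large.word.list`; its entries are chosen for illustration and are not the options from the original question.

```r
# Hypothetical stand-in for very.large.word.list
words <- c("Iraq", "Iraqi", "qat", "quiet", "aqua")

# value = TRUE returns the matching strings instead of their positions
q_not_u <- grep("q[^u]", words, value = TRUE)
q_not_u
```

Here "Iraq" is not returned even though its "q" is not followed by "u": there is no character after the "q" for `[^u]` to match. "quiet" and "aqua" fail because the character after "q" is "u".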
| is a normal character and would therefore match “grey” and “gray” and “gr|y”, which is not what we want, but it would work to match both “grey” and “gray”.
\d matches only a single digit).
\d+ matches at least one digit).
. matches anything, and returns only a single character).
. matches anything, and with the + it returns multiple characters).
\. matches the period, .).
"(?<=\\$)\\d+"
"\\w+(?= pie)"
map_chr(c(1,4,7), addTen) because the output is in quotes, so the values are strings, not numbers.
map() function allows vectors, lists, and data frames as input.
map(c(1, 4, 7), ~addTen). The ~ acts on functions that do not have their own name or that are defined by function(...). By adding the argument (.x) we’ve expanded the addTen() function, and so it needs a ~. The addTen() function all alone does not use a ~.
jan31.ymd() is not a function).
all of the above
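
The lookaround answers above can be tried out with base R alone. This is a minimal sketch: the input strings and the helper `first_match()` are hypothetical and defined here only for illustration; the patterns are the ones listed in the questions. `perl = TRUE` is needed because lookarounds are a PCRE feature.

```r
# Hypothetical inputs; only the patterns come from the text
price <- "The pie costs $42"
pies <- c("apple pie", "chocolate pie", "peach pie")

# Hypothetical helper: return the first match found in each string
first_match <- function(pattern, x) {
  regmatches(x, regexpr(pattern, x, perl = TRUE))
}

one_digit <- first_match("(?<=\\$)\\d", price)    # a single digit preceded by "$"
all_digits <- first_match("(?<=\\$)\\d+", price)  # all digits preceded by "$"
flavors <- first_match("\\w+(?= pie)", pies)      # word characters followed by " pie"
not_pie <- first_match("\\w+(?! pie)", pies)      # word characters NOT followed by " pie"
```

The negative lookahead is worth a look: `\\w+` first grabs the whole word, the lookahead rejects it because " pie" follows, and the engine backs off by one character, so `not_pie` holds the flavor names with their last letter removed.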
It totally depends on your personality and your finances. b. doesn’t make much sense. But a., c., and d. are all very reasonable questions to ask about your investments.
replace = FALSE)
There can’t possibly be a right answer here.
< >
img
[href]
tbl in storage and R tibble in memory
SELECT FirstName FROM Persons
SELECT * FROM Persons
SELECT COUNT(*) FROM Persons
SELECT * FROM Persons WHERE FirstName = 'Peter' (d. would also work.)
SELECT * FROM Persons WHERE FirstName = 'Peter' AND LastName = 'Jackson'
BETWEEN
SELECT * FROM Persons WHERE LastName BETWEEN 'Hansen' AND 'Pettersen'
SELECT DISTINCT
ORDER BY
SELECT * FROM Persons ORDER BY FirstName DESC
WHERE type = 'fruit' AND (color = 'yellow' OR color = 'green')
SELECT statements.
NULL (it would be NA in R)
SELECT * FROM Persons WHERE FirstName REGEXP '(?i)a.*' (n.b., the LIKE operator will give you a similar result, with % as a wildcard: SELECT * FROM Persons WHERE FirstName LIKE 'a%')
shinyApp()
server
selectInput()
sliderInput("slider", "Slider", min = 1, max = 100, value = 50)